Method and a device for coding audio signals and a method and a device for decoding a bit stream

ABSTRACT

The present invention permits a combination of a scalable audio coder with the TNS technique. In a method for coding time signals sampled in a first sampling rate, second time signals are first generated whose sampling rate is smaller than the first sampling rate. The second time signals are then coded according to a first coding algorithm and written into a bit stream. The coded second time signals are, however, decoded again, and, like the first time signals, transformed into the frequency domain. From a spectral representation of the first time signals, TNS prediction coefficients are calculated. The transformed output signal of the coder/decoder with the first coding algorithm, like the spectral representation of the first time signal, undergoes a prediction over the frequency to obtain residual spectral values for both signals, though only the prediction coefficients calculated on the basis of the first time signals are used. These two signals are evaluated against each other. The evaluated residual spectral values are then coded by means of a second coding algorithm to obtain coded evaluated residual spectral values, which, together with the side information containing the calculated prediction coefficients, are written into the bit stream.

FIELD OF THE INVENTION

The present invention relates to scalable audio coders and audio decoders and in particular to scalable coders and decoders for which at least one stage operates in the frequency domain.

BACKGROUND OF THE INVENTION AND DESCRIPTION OF PRIOR ART

Scalable audio coders are coders of modular design. An effort is therefore made to use already existing speech coders, which process signals which e.g. are sampled with 8 kHz and produce data rates of e.g. 4.8 to 8 kilobits per second. These known coders, such as e.g. the coders G.729, G.723, FS1016, CELP or parametric models for MPEG-4-Audio, which are known to persons skilled in the art, serve primarily for coding speech signals and are not generally suitable for coding higher quality music signals since they are normally designed for signals sampled with 8 kHz, so that they can only code an audio bandwidth of 4 kHz at the most. In general, however, they exhibit a low sampling rate and good quality for speech signals.

For the audio coding of music signals, e.g. to achieve HIFI quality or CD quality, with a scalable coder a speech coder is therefore combined with an audio coder, which can code signals with higher sampling rates, e.g. 48 kHz. Obviously it is also possible to replace the speech coder cited above by another coder, e.g. by a music/audio coder according to the Standards MPEG1, MPEG2 or MPEG4.

A chain circuit of this kind comprises a speech coder and a higher quality audio coder. An input signal, having a sampling rate of e.g. 48 kHz, is converted by means of a downsampling filter to the appropriate sampling frequency for the speech coder. The sampling rate could, however, also be the same in both coders. The converted signal is then coded. The coded signal can be supplied directly to a bit stream formatting device for transmission. However, it only contains signals with a bandwidth of e.g. 4 kHz at the most. The coded signal is also decoded again and converted by means of an upsampling filter. Because of the downsampling filter, however, the signal now obtained only contains useful information with a bandwidth of e.g. 4 kHz. In addition it should be noted that the spectral content of the converted coded/decoded signal in the lower band up to 4 kHz does not correspond exactly to the first 4 kHz band of the input signal sampled with 48 kHz, since in general coders introduce coding errors.

As has already been mentioned, a scalable coder comprises a generally known speech coder and an audio coder which can process signals with higher sampling rates. To be able to transmit signal components of the input signal which have frequencies above 4 kHz, the difference between the input signal at 48 kHz and the coded/decoded converted output signal of the speech coder is formed for each individual discrete-time sampled value. This difference can then be quantized and coded using a known audio coder, as is known to persons skilled in the art. It should be pointed out here that, apart from coding errors, the difference signal which is fed to the audio coder, which can code signals with higher sampling rates, is essentially zero in the lower frequency range. In the spectral range lying above the bandwidth of the upward converted coded/decoded output signal of the speech coder, the difference signal substantially corresponds to the true input signal at 48 kHz.
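The two-layer structure described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration only: the averaging `downsample`, the sample-repeating `upsample` and the quantizing `core_code_decode` are crude hypothetical stand-ins for a real downsampling filter, upsampling filter and speech coder, the factor 6 merely reflects the ratio 48 kHz to 8 kHz, and the input length is assumed to be a multiple of that factor.

```python
def downsample(signal, factor):
    """Crude stand-in for a downsampling filter: average each group of samples."""
    return [sum(signal[i:i + factor]) / factor
            for i in range(0, len(signal), factor)]


def upsample(signal, factor):
    """Crude stand-in for an upsampling filter: repeat every sample."""
    return [s for s in signal for _ in range(factor)]


def core_code_decode(signal, step=0.25):
    """Hypothetical lossy first-stage codec: quantize, then dequantize."""
    return [round(s / step) * step for s in signal]


def scalable_layers(first_signal, factor=6):
    """Return the coded/decoded core signal and the sample-wise difference."""
    core = core_code_decode(downsample(first_signal, factor))
    converted = upsample(core, factor)
    # Difference formed for each individual discrete-time sampled value.
    difference = [x - y for x, y in zip(first_signal, converted)]
    return core, difference


signal = [0.1 * n for n in range(12)]
core, difference = scalable_layers(signal, factor=6)
rebuilt = [c + d for c, d in zip(upsample(core, 6), difference)]
```

If the difference were transmitted without loss, adding it to the converted core signal would restore the input; in the coder described here the difference is itself quantized and coded by the second stage.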

In the first stage, i.e. the speech coder stage, a coder with low sampling frequency is therefore generally used, since in general a very low bit rate of the coded signal is aimed at. At the present time a number of coders, including the cited coders, work with bit rates of a few kilobits (two to 8 kilobits or also more). Furthermore, these enable a maximum sampling frequency of 8 kHz, since more audio bandwidth is not possible anyway at this low bit rate and the coding at low sampling frequency is more advantageous as regards the computational effort. The maximum possible audio bandwidth is 4 kHz and in practice it is restricted to about 3.5 kHz. If a bandwidth improvement is to be achieved in the further stage, i.e. in the stage with the audio coder, this further stage must work with a higher sampling frequency.

The use of the so-called TNS technique in high quality audio coding to further reduce the amount of data has been known for some time (J. Herre, J. D. Johnston, “Enhancing the Performance of Perceptual Audio Coders by Using Temporal Noise Shaping (TNS)”, 101st AES Convention, Los Angeles 1996, Preprint 4384). The TNS technique (TNS=Temporal Noise Shaping), generally speaking, permits temporal shaping of the fine structure of the quantization noise by means of a predictive coding of the spectral values. The TNS technique is based on a consistent application of the dualism between the time domain and the frequency domain. In the technical field it is known that when the autocorrelation function of a time signal is transformed into the frequency domain it gives the spectral power density of this very time signal. The dual case hereto results when the autocorrelation function of the spectrum of a signal is formed and transformed into the time domain. The autocorrelation function transformed into or back into the time domain is also called the square of the Hilbert envelope curve of the time signal. The Hilbert envelope curve of a signal is thus connected directly with the autocorrelation function of its spectrum. The squared Hilbert envelope curve of a signal and the spectral power density of the same thus represent dual aspects in the time domain and in the frequency domain. If the Hilbert envelope curve of a signal remains constant for each partial bandpass signal over a range of frequencies, then the autocorrelation between neighbouring spectral values will also be constant. This means in fact that the series of spectral coefficients is stationary versus frequency, so that predictive coding techniques can be used efficiently to represent this signal and this, furthermore, by using a common set of prediction coefficients.

To clarify the situation, reference is made to FIG. 6A and FIG. 6B. FIG. 6A shows a short section of a temporally strongly transient “castanet” signal with a duration of about 40 ms. This signal was decomposed into a multiplicity of partial bandpass signals, each partial bandpass signal having a bandwidth of 500 Hz. FIG. 6B now shows the Hilbert envelope curves for these bandpass signals with middle frequencies ranging from 1500 Hz to 4000 Hz. To make things clearer, all the envelope curves have been normalized to their maximum amplitude. Clearly the shapes of all the single envelope curves are very similar to one another, which is why a common predictor can be used within this frequency range to code the signal efficiently. Similar observations can be made for speech signals in which the effect of the glottal excitation pulses is present over the whole frequency range because of the nature of the human speech generation mechanism.

FIG. 6B thus shows that the correlation of neighbouring values e.g. at a frequency of 2000 Hz is similar to that at a frequency of e.g. 3000 Hz or 1000 Hz.

Alternatively, the property of spectral predictability of transient signals can be understood by considering the table shown in FIG. 5. At the top left of the table a continuous time signal u(t) is shown in the form of a sine wave. Next to this is the spectrum U(f) of this signal, consisting of a single Dirac pulse. The optimal coding of this signal consists in the coding of spectral data or spectral values since, for the complete time signal, only the magnitude and the phase of the Fourier coefficient have to be transmitted here in order to be able to reconstruct the time signal completely. A coding of spectral data corresponds at the same time to a prediction in the time domain. A predictive coding would thus have to take place here in the time domain. The sinusoidal time signal thus has a flat temporal envelope curve, which corresponds to a maximally non-flat envelope curve in the frequency domain.

The opposite case will now be considered in which the time signal u(t) is a maximally transient signal in the form of a Dirac pulse in the time domain. A Dirac pulse in the time domain corresponds to a “flat” power spectrum, while the phase spectrum rotates according to the time position of the pulse. It is clear that this signal poses a problem for the traditional methods cited above, such as e.g. the transform coding or coding of spectral data or a linear prediction coding of the time domain data. This signal can be coded best and most effectively in the time domain, since only the temporal position and the power of the Dirac pulse have to be transmitted, which, through consistent use of the dualism, means that a predictive coding in the frequency domain also constitutes a suitable method for efficient coding.

It is very important not to confuse the predictive coding of spectral coefficients over the frequency with the known dual concept of the prediction of spectral coefficients from one block to the next, which has already been implemented and is also described in the article cited above (M. Bosi, K. Brandenburg, S. Quackenbush, L. Fielder, K. Akagiri, H. Fuchs, M. Dietz, J. Herre, G. Davidson, Yoshiaki Oikawa: “ISO/IEC MPEG-2 Advanced Audio Coding”, 101st AES Convention, Los Angeles 1996, Preprint 4382). In the prediction of spectral coefficients from one block to the next, which corresponds to a prediction over the time, the spectral resolution is increased, whereas a prediction of spectral coefficients over the frequency increases the temporal resolution. A spectral coefficient at e.g. 1000 Hz can therefore be determined from the spectral coefficient at e.g. 900 Hz in the same block or frame.

The above considerations thus led to the attainment of an efficient coding method for transient signals. Predictive coding techniques can, taking account of the duality between time and frequency domain, substantially be handled analogously to the already known prediction from a spectral coefficient to the spectral coefficient with the same frequency in the next block. Since the spectral power density and the squared Hilbert envelope curve of a signal are dual to each other, a reduction in a residual signal energy or a prediction gain is obtained which depends on the measure of flatness of the squared envelope curve of the signal and not on the spectral measure of flatness as in the conventional prediction method. The potential coding gain increases as the signals become more transient.

Both the prediction scheme with closed loop, also known as backward prediction, and the prediction scheme with open loop, also known as forward prediction, offer themselves as possible prediction schemes. In the case of the spectral prediction scheme with closed loop (backward prediction) the envelope curve of the error is flat. Expressed differently, the error signal energy is distributed uniformly over the time.

In the case of a forward prediction, however, as shown in FIG. 7, there is a temporal shaping of the noise introduced by quantization. A spectral coefficient to be predicted x(f) is fed to a summation point 600. The same spectral coefficient is also fed to a predictor 610, the output signal of which is also fed, with negative sign, to the summation point 600. The input signal to a quantizer 620 thus represents the difference between the spectral value x(f) and the spectral value x_(p)(f) calculated by prediction. For forward prediction the total error energy in the decoded spectral coefficient data will remain constant. The quantization error signal will, however, appear temporally shaped at the output of the decoder since the prediction was applied to the spectral coefficients, whereby the quantization noise is temporally placed under the actual signal and can thus be masked. In this way problems of temporal masking, e.g. for transient signals or speech signals, are avoided.
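The open-loop structure of FIG. 7 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `forward_prediction_residual` is hypothetical, the predictor is assumed to be a linear combination of the lower-frequency neighbours in the same block, and spectral values below the first coefficient are assumed to contribute nothing to the prediction.

```python
def forward_prediction_residual(spectrum, coeffs):
    """Open-loop prediction over frequency.

    For every spectral value x(k) the predictor forms
    x_p(k) = sum_j coeffs[j-1] * x(k-j) from the unquantized
    neighbours at lower frequencies, and the summation point
    outputs the difference x(k) - x_p(k), which would then be
    fed to the quantizer.
    """
    residual = []
    for k, x_k in enumerate(spectrum):
        predicted = sum(a * spectrum[k - j]
                        for j, a in enumerate(coeffs, start=1)
                        if k - j >= 0)
        residual.append(x_k - predicted)
    return residual


# A spectrum that decays geometrically over frequency is predicted
# perfectly by a first-order predictor with coefficient 0.5.
residual = forward_prediction_residual([1.0, 0.5, 0.25, 0.125], [0.5])
```

Because the predictor operates on the original rather than the quantized spectral values, this corresponds to the forward (open-loop) scheme and not to the backward (closed-loop) scheme.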

This type of predictive coding of spectral values is thus called the TNS or temporal noise shaping technique. For clarification of this technique reference is made to FIG. 8A. At the top left in FIG. 8A the temporal behaviour of a strongly transient time signal is shown. Shown opposite this temporal behaviour curve at the top right in FIG. 8A is the section of a DCT spectrum. The graph at the bottom left in FIG. 8A shows the resulting frequency response of a TNS synthesis filter which was calculated by the LPC operation (LPC=Linear Prediction Coding). It should be noted that the (normalized) frequency coordinates in this diagram correspond to the time coordinates due to the time domain and frequency domain dualism. The LPC calculation obviously produces a “source model” of the input signal, since the frequency response of the LPC-calculated synthesis filter resembles the envelope curve of the strongly transient time signal. A representation of the residual spectral values, i.e. of the input signal of the quantizer 620 in FIG. 7, over the frequency is shown at the bottom right in FIG. 8A. A comparison of the residual spectral values after prediction and the spectral values obtained with direct time-frequency transform shows that the residual spectral values have far less energy than the original spectral values. In the example shown the reduction in the energy of the residual spectral values corresponds to a total prediction gain of about 12 dB.

The following points should be noted in connection with the graph at the bottom left in FIG. 8A. For classical use of prediction on time domain signals, the frequency response of the synthesis filter is an approximation of the magnitude spectrum of the input signal. The synthesis filter (re)generates to some extent the spectral shape of the signal from a residual signal with an approximately “white” spectrum. When prediction is used on spectral signals as in the case of the TNS technique, the frequency response of the synthesis filter is an approximation of the envelope curve of the input signal. The frequency response of the synthesis filter is not the result of the Fourier transform of the pulse response, as in the classical case, but the result of the inverse Fourier transform. The TNS synthesis filter (re)generates so to speak the form of the envelope curve of the signal from a residual signal with an approximately “white” (i.e. flat) envelope curve. Thus the graph at the bottom left in FIG. 8A shows the input signal envelope curve as modelled by the TNS synthesis filter. This is here a logarithmic representation of the envelope curve approximation of the castanet signal shown in the figure above it.

Subsequently a coding noise was introduced into the residual spectral values such that a signal/noise ratio of about 13 dB resulted in each coding band with a width of e.g. 0.5 Bark. The error signals in the time domain resulting from the introduction of the quantization noise are shown in FIG. 8B. The left-hand diagram in FIG. 8B shows the error signal due to the quantization noise when using the TNS technique, while in the diagram on the right the TNS technique has not been used, thus providing a comparison. As expected, the error signal in the left-hand diagram is not distributed evenly over the block but is concentrated in the region in which there is a higher signal content, which will optimally mask this quantization noise. In the right-hand case, on the other hand, the introduced quantization noise is distributed evenly over the block, i.e. over the time, with the result that in the region at the front where there is no signal, or scarcely so, noise is also present, which will be audible, while in the region in which there is a high signal content there is relatively little noise, so that the masking possibilities of the signal are not completely exploited.

A simple, i.e. non-scalable, audio coder with a TNS filter is described in the following.

An implementation of a TNS filter 804 in a coder is shown in FIG. 9A. This filter is arranged between an analysis filter bank 802 and a quantizer 806. The discrete-time input signal for the coder shown in FIG. 9A is fed into an audio input 800, while the quantized audio signal, i.e. the quantized spectral values, or the quantized residual spectral values are output at an output 808, which may be followed by a redundancy coder. The input signal is therefore transformed into spectral values. On the basis of the calculated spectral values a normal linear prediction computation is performed, e.g. by forming the autocorrelation matrix of the spectral values and using a Levinson-Durbin recursion. FIG. 9B shows a detailed view of the TNS filter 804. The spectral values x(1), . . . , x(i), . . . , x(n) are fed in at a filter input 810. It may happen that only a certain frequency range exhibits transient signals, whereas another frequency range is primarily of a stationary nature. This fact is allowed for in the TNS filter 804 through an input switch 812 and an output switch 814, though the primary function of these switches is to provide parallel-to-serial or serial-to-parallel conversion of the data to be processed.

If a certain frequency range is unsteady and promises a certain coding gain through the TNS technique, then only this spectral range is TNS processed, this being achieved in that the input switch 812 starts at the spectral value x(i) for example and sweeps through to the spectral value x(i+2) for instance. The inner region of the filter again comprises the forward prediction structure, i.e. the predictor 610 and the summation point 600.

The calculation for determining the filter coefficients of the TNS filter or for determining the prediction coefficients is performed as follows. The formation of the autocorrelation matrix and the application of the Levinson-Durbin recursion is performed for the highest permissible order of the noise shaping filter, e.g. 20. If the calculated prediction gain exceeds a certain threshold, TNS processing is activated.

The order of the employed noise shaping filter for the current block is then determined by subsequent removal of all coefficients with a sufficiently small absolute value from the end of the coefficient array. This results in the orders of TNS filters having values which normally lie in the range 4-12 for a speech signal.
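The coefficient determination outlined in the two preceding paragraphs can be sketched as below. The sketch assumes the autocorrelation method with a textbook Levinson-Durbin recursion; the names `tns_analyze` and `trim_order`, the gain threshold of 1.4 and the magnitude limit `eps` are hypothetical values chosen for illustration, since the text only speaks of "a certain threshold" and "a sufficiently small absolute value".

```python
def autocorrelation(x, max_lag):
    """Autocorrelation of the spectral values for lags 0..max_lag."""
    return [sum(x[i] * x[i + lag] for i in range(len(x) - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations; returns (a_1..a_order, error energy)
    for the predictor x_p(k) = sum_j a_j * x(k - j)."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


def trim_order(coeffs, eps):
    """Remove sufficiently small coefficients from the end of the array."""
    n = len(coeffs)
    while n > 0 and abs(coeffs[n - 1]) < eps:
        n -= 1
    return coeffs[:n]


def tns_analyze(spectrum, max_order=20, gain_threshold=1.4, eps=1e-3):
    """Return trimmed prediction coefficients, or None if TNS stays off."""
    r = autocorrelation(spectrum, max_order)
    if r[0] == 0.0:
        return None
    coeffs, err = levinson_durbin(r, max_order)
    gain = r[0] / err if err > 0.0 else float("inf")
    if gain <= gain_threshold:  # prediction gain too small: TNS not activated
        return None
    return trim_order(coeffs, eps)


decaying = [0.9 ** k for k in range(200)]
coeffs = tns_analyze(decaying, max_order=4)
flat = tns_analyze([1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], max_order=4)
```

For the geometrically decaying test spectrum the recursion is run at order 4, but only the first coefficient survives trimming, which mirrors how the filter order for a block is derived from the highest permissible order.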

If a sufficiently high coding gain is determined for a range of spectral values x(i) for example, this range is processed and the residual spectral value x_(R)(i) appears instead of the spectral value x(i) at the output of the TNS filter. This residual value has a much smaller amplitude than the original spectral value x(i), as can be seen from FIG. 8A. In addition to the normal side information, the side information transmitted to the decoder thus contains a flag showing the use of TNS and, if necessary, information on the destination frequency range and on the TNS filter which was used for coding. The filter data can be represented as quantized filter coefficients.

In analogy to the coder with TNS filter, a decoder having an inverse TNS filter will now be considered.

In the decoder, which is sketched in FIG. 10A, a TNS coding is reversed for each channel. Residual spectral values x_(R)(i) are requantized in the inverse quantizer 216 and fed into an inverse TNS filter 900 whose construction is shown in more detail in FIG. 10B. As output signal the inverse TNS filter 900 delivers spectral values again, which are transformed into the time domain in a synthesis filter bank 218. The inverse TNS filter 900 includes an input switch 902 and an output switch 908, which again serve chiefly to provide parallel-to-serial or serial-to-parallel conversion of the data to be processed. The input switch 902 also takes account of a possible destination frequency range so as to subject only residual spectral values to inverse TNS coding, whereas spectral values which are not TNS coded are allowed to pass through unchanged to an output 910. The inverse prediction filter comprises a predictor 906 and a summation point 904. In contrast to the TNS filter, however, these are connected as follows. A residual spectral value arrives via the input switch 902 at the summation point 904, where it is summed together with the output signal of the predictor 906. As output signal the predictor supplies an estimated spectral value x_(p)(i). The spectral value x(i) is output at the output of the inverse TNS filter via the output switch. The TNS-related side information is thus decoded in the decoder, the side information including a flag indicating the use of TNS and, if necessary, information concerning the destination frequency range. In addition, the side information contains the filter coefficients of the prediction filter which was used to code a block or “frame”.
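The relationship between the TNS filter of FIG. 9B and the inverse TNS filter of FIG. 10B can be sketched as a pair of functions. This is an illustrative sketch under stated assumptions: the names `tns_filter` and `inverse_tns_filter` are hypothetical, the destination frequency range is modelled as a half-open index range `start..stop`, and the predictor is assumed to start with an empty state at `start`, i.e. values below the range do not enter the prediction.

```python
def tns_filter(spectrum, coeffs, start, stop):
    """Coder side: replace x(k) by the residual x(k) - x_p(k) inside
    the range; values outside the range pass through unchanged."""
    out = list(spectrum)
    for k in range(start, stop):
        predicted = sum(a * spectrum[k - j]
                        for j, a in enumerate(coeffs, start=1)
                        if k - j >= start)
        out[k] = spectrum[k] - predicted
    return out


def inverse_tns_filter(residual, coeffs, start, stop):
    """Decoder side: the summation point adds the predictor output,
    which is computed from already reconstructed spectral values."""
    out = list(residual)
    for k in range(start, stop):
        predicted = sum(a * out[k - j]
                        for j, a in enumerate(coeffs, start=1)
                        if k - j >= start)
        out[k] = residual[k] + predicted
    return out


spectrum = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
coeffs = [0.5, -0.25]
residual = tns_filter(spectrum, coeffs, start=2, stop=6)
restored = inverse_tns_filter(residual, coeffs, start=2, stop=6)
```

Without quantization the round trip restores the spectrum up to rounding. The coder-side filter is non-recursive, whereas the decoder-side filter feeds its own output back into the predictor, which is the difference in wiring between the two summation points.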

The TNS method may thus be summarized as follows. An input signal is transformed into a spectral representation by means of a high-resolution analysis filter bank. A linear prediction is then performed in the frequency domain between spectral values which are neighbours as regards frequency. This linear prediction can be interpreted as a filter process for filtering the spectral values which is performed in the spectral domain. In this way the original spectral values are replaced by the prediction errors, i.e. by the residual spectral values. These residual spectral values, quantized and coded just like normal spectral values, are transferred to the decoder, where the values are decoded and inversely quantized. Before using the inverse filter bank (synthesis filter bank) an inverse prediction, inverse that is to the prediction carried out in the coder, is performed in that the inverse prediction filter is employed on the transmitted prediction error signal, i.e. on the requantized residual spectral values.

By employing this technique it is possible to match the temporal envelope curve of the quantization noise to that of the input signal. This permits better exploitation of the masking of the error signals for signals with a pronounced temporal fine structure or a pronounced transient behaviour. In the case of transient signals the TNS technique avoids the so-called “pre-echos”, for which the quantization noise already appears prior to the “attack” of such a signal.

As has already been mentioned, in a scalable audio coder a coder with a low sampling frequency is employed in the first stage since a very low bit rate of the coded signal is generally sought. In the second stage there is preferably an audio coder, which codes at higher bit rates but requires a much larger bandwidth and can thus code audio signals with much higher sound quality than the speech coder can. Normally an audio signal which is to be coded and which has resulted at a high sampling rate is first down-converted to a lower sampling rate, e.g. by using a downsampling filter. The reduced sampling rate signal is then fed into the coder of the first stage, the output signal of this coder being written directly into the bit stream which emerges from the scalable audio coder. This coded signal with lower bandwidth is decoded again and is then brought back to the high sampling rate, e.g. by using an upsampling filter, and is then transformed into the frequency domain. Also transformed into the frequency domain is the audio signal originally present at the input of the coder. Two audio signals are now available, the first of them suffering from the coding errors of the coder of the first stage, however. These two signals in the frequency domain can now be fed to a difference element to obtain a signal which represents only the difference between the two signals. In a switching module, which can also be implemented as a frequency selective switch, as is described later, it is possible to determine whether it is better subsequently to process the difference between the two input signals or, instead, to process directly the original audio signal transformed into the frequency domain.
In any case the output signal of the switching module is fed to a known quantizer/coder, for example, which, if it functions according to an MPEG standard, performs both a quantization taking account of a psychoacoustic model and then subsequently an entropy coding, preferably using Huffman coding with the quantized spectral values. The output signal of the quantizer and coder is written into the bit stream together with the output signal of the coder of the first stage. At first sight it may seem to be a good idea to place the TNS filter described at the beginning directly behind the switching module, i.e. in front of the quantizer/coder, in order to simply imitate the structure shown in FIG. 10A. A disadvantage of this solution, however, is that the output signal of the switching module is greatly changed in relation to the original temporal audio signal at the input of the coder, with the result that a filter coefficient determination for the TNS filter is not applicable with the same quality.

SUMMARY OF THE INVENTION

It is the object of the present invention to combine the concept of scalable audio coding and the concept of temporal noise shaping so as to benefit from temporal noise shaping in the case of scalable audio coders as well.

In accordance with a first embodiment of the present invention, this object is achieved by a method for coding discrete first time signals which have been sampled with a first sampling rate, comprising the following steps: generating second time signals, whose bandwidth corresponds to a second sampling rate, from the first time signals, the second sampling rate being equal to or less than the first sampling rate; coding the second time signals according to a first coding algorithm to obtain coded second signals; decoding the coded second signals according to the first coding algorithm to obtain coded/decoded second time signals whose bandwidth corresponds to the second sampling frequency; transforming the first time signals into the frequency domain to obtain first spectral values; calculating prediction coefficients from the first spectral values; generating second spectral values from coded/decoded second time signals, the second spectral values being a representation of the coded/decoded second time signals in the frequency domain; evaluating the first spectral values with the second spectral values to obtain evaluated spectral values whose number corresponds to the number of the first spectral values; performing a prediction of the evaluated spectral values over the frequency by means of the calculated prediction coefficients to obtain evaluated residual spectral values; and coding the evaluated residual spectral values according to a second coding algorithm to obtain coded evaluated residual spectral values.

In accordance with a second embodiment of the present invention, this object is achieved by a method for coding discrete first time signals which have been sampled with a first sampling rate, comprising the following steps: generating second time signals, whose bandwidth corresponds to a second sampling rate, from the first time signals, the second sampling rate being equal to or less than the first sampling rate; coding the second time signals according to a first coding algorithm to obtain coded second signals; decoding the coded second signals according to the first coding algorithm to obtain coded/decoded second time signals whose bandwidth corresponds to the second sampling frequency; transforming the first time signals into the frequency domain to obtain first spectral values; calculating prediction coefficients from the first spectral values; generating second spectral values from coded/decoded second time signals, the second spectral values being a representation of the coded/decoded second time signals in the frequency domain; performing a prediction of the first spectral values and the second spectral values over the frequency to obtain first residual spectral values and second residual spectral values, using the calculated prediction coefficients; evaluating the first residual spectral values with the second residual spectral values to obtain evaluated residual spectral values whose number corresponds to the number of the first spectral values; and coding the evaluated residual spectral values according to a second coding algorithm to obtain coded evaluated residual spectral values.
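The first and second embodiments differ only in the order of evaluation and prediction. The following sketch compares the two orders under a simplifying assumption that is not taken from the text: the evaluation is modelled as a plain subtraction over all spectral values, without a frequency selective switch. The function names and the spectral values are hypothetical and serve only as an illustration.

```python
def predict_over_frequency(values, coeffs):
    """Prediction over frequency: value minus a weighted sum of its
    lower-frequency neighbours (same coefficients for any input)."""
    return [v - sum(a * values[k - j]
                    for j, a in enumerate(coeffs, start=1) if k - j >= 0)
            for k, v in enumerate(values)]


def evaluate_then_predict(first, second, coeffs):
    """Order of the first embodiment: evaluate, then predict."""
    evaluated = [x - y for x, y in zip(first, second)]
    return predict_over_frequency(evaluated, coeffs)


def predict_then_evaluate(first, second, coeffs):
    """Order of the second embodiment: predict both, then evaluate."""
    first_residual = predict_over_frequency(first, coeffs)
    second_residual = predict_over_frequency(second, coeffs)
    return [x - y for x, y in zip(first_residual, second_residual)]


first_spectrum = [4.0, 2.5, -1.0, 0.5, 3.0, -2.0]
second_spectrum = [3.8, 2.6, -0.9, 0.0, 0.0, 0.0]
coeffs = [0.6, -0.2]  # assumed to be calculated from the first spectral values
path_a = evaluate_then_predict(first_spectrum, second_spectrum, coeffs)
path_b = predict_then_evaluate(first_spectrum, second_spectrum, coeffs)
```

Under this assumption both orders yield the same evaluated residual spectral values up to rounding, because the prediction is a linear operation and the same coefficients are applied to both signals. With a frequency selective evaluation this equality need not hold.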

In accordance with a third embodiment of the present invention, this object is achieved by a method for decoding a bit stream which represents an audio signal, where the bit stream has signals coded according to a first coding algorithm, signals coded according to a second coding algorithm, and side information, where the signals coded according to the second coding algorithm have coded residual spectral values, where the residual spectral values are generated from evaluated spectral values by prediction over the frequency, where prediction coefficients of the prediction are present in the side information, comprising the following steps: decoding the coded signals which have been coded according to the first coding algorithm to obtain coded/decoded second time signals by means of the first coding algorithm; decoding the coded residual spectral values by means of the second coding algorithm to obtain the residual spectral values; transforming the coded/decoded second time signals into the frequency domain to obtain the second spectral values; performing an inverse prediction with the evaluated residual spectral values using the prediction coefficients which are present in the side information to obtain the evaluated spectral values; inversely evaluating the evaluated spectral values and the second spectral values to obtain the first spectral values; and transforming the first spectral values back into the time domain to obtain first time signals.

In accordance with a fourth embodiment of the present invention, this object is achieved by a method for decoding a bit stream which represents an audio signal, where the bit stream has signals coded according to a first coding algorithm, signals coded according to a second coding algorithm, and side information, where the signals coded according to the second coding algorithm have coded residual spectral values, where the residual spectral values are generated from evaluated spectral values by prediction over the frequency, where prediction coefficients of the prediction are present in the side information, comprising the following steps: decoding the coded signals which have been coded according to the first coding algorithm to obtain coded/decoded second time signals by means of the first coding algorithm; decoding the coded residual spectral values by means of the second coding algorithm to obtain the residual spectral values; transforming the coded/decoded second time signals into the frequency domain to obtain the second spectral values; performing a prediction with the second spectral values using the prediction coefficients which are present in the side information to obtain second residual spectral values; inversely evaluating the evaluated residual spectral values and the second residual spectral values to obtain the residual spectral values; performing an inverse prediction with the residual spectral values using the prediction coefficients which are stored in the side information to obtain first spectral values; and transforming the first spectral values back into the time domain to obtain first time signals.
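The spectral-domain steps of the fourth embodiment can be sketched as follows. The sketch rests on assumptions that go beyond the text: the inverse evaluation is modelled as a plain addition (the counterpart of a difference formed in the coder), quantization and the two coding algorithms are omitted, and all function names and numerical values are hypothetical.

```python
def predict_over_frequency(values, coeffs):
    """Prediction over frequency with the transmitted coefficients."""
    return [v - sum(a * values[k - j]
                    for j, a in enumerate(coeffs, start=1) if k - j >= 0)
            for k, v in enumerate(values)]


def inverse_predict(residual, coeffs):
    """Inverse prediction: rebuild spectral values from residual values."""
    out = []
    for k, r in enumerate(residual):
        out.append(r + sum(a * out[k - j]
                           for j, a in enumerate(coeffs, start=1)
                           if k - j >= 0))
    return out


def decode_spectral_steps(evaluated_residual, second_spectrum, coeffs):
    """Prediction of the second spectral values, inverse evaluation,
    then inverse prediction to obtain the first spectral values."""
    second_residual = predict_over_frequency(second_spectrum, coeffs)
    first_residual = [e + s
                      for e, s in zip(evaluated_residual, second_residual)]
    return inverse_predict(first_residual, coeffs)


# Coder side (difference evaluation assumed), used only to produce input.
first_spectrum = [4.0, 2.5, -1.0, 0.5, 3.0, -2.0]
second_spectrum = [3.8, 2.6, -0.9, 0.0, 0.0, 0.0]
coeffs = [0.6, -0.2]
evaluated_residual = [
    x - y for x, y in zip(predict_over_frequency(first_spectrum, coeffs),
                          predict_over_frequency(second_spectrum, coeffs))]

decoded = decode_spectral_steps(evaluated_residual, second_spectrum, coeffs)
```

In this lossless toy setting the decoded values agree with the first spectral values up to rounding. The decoder needs only the second spectral values, the evaluated residual spectral values and the prediction coefficients from the side information.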

In accordance with a fifth embodiment of the present invention, this object is achieved by an apparatus for coding discrete first time signals which have been sampled with a first sampling rate, comprising: a device for generating second time signals, whose bandwidth corresponds to a second sampling rate, from the first time signals, the second sampling rate being equal to or less than the first sampling rate; a first coder for coding the second time signals according to a first coding algorithm to obtain coded second signals; a decoder for decoding the coded second signals according to the first coding algorithm to obtain coded/decoded second time signals whose bandwidth corresponds to the second sampling frequency; a transformer for transforming the first time signals into the frequency domain to obtain first spectral values; a calculator for calculating prediction coefficients from the first spectral values; a device for generating second spectral values from coded/decoded second time signals, the second spectral values being a representation of the coded/decoded second time signals in the frequency domain; a device for evaluating the first spectral values with the second spectral values to obtain evaluated spectral values whose number corresponds to the number of the first spectral values; a predictor for performing a prediction of the evaluated spectral values over the frequency by means of the calculated prediction coefficients to obtain evaluated residual spectral values; and a second coder for coding the evaluated residual spectral values according to a second coding algorithm to obtain coded evaluated residual spectral values.

In accordance with a sixth embodiment of the present invention, this object is achieved by an apparatus for coding discrete first time signals which have been sampled with a first sampling rate, comprising: a device for generating second time signals, whose bandwidth corresponds to a second sampling rate, from the first time signals, the second sampling rate being equal to or less than the first sampling rate; a first coder for coding the second time signals according to a first coding algorithm to obtain coded second signals; a decoder for decoding the coded second signals according to the first coding algorithm to obtain coded/decoded second time signals whose bandwidth corresponds to the second sampling frequency; a transformer for transforming the first time signals into the frequency domain to obtain first spectral values; a calculator for calculating prediction coefficients from the first spectral values; a device for generating second spectral values from coded/decoded second time signals, the second spectral values being a representation of the coded/decoded second time signals in the frequency domain; a predictor for performing a prediction of the first spectral values and the second spectral values over the frequency to obtain first residual spectral values and second residual spectral values, using the calculated prediction coefficients; a device for evaluating the first residual spectral values with the second residual spectral values to obtain evaluated residual spectral values whose number corresponds to the number of the first spectral values; and a second coder for coding the evaluated residual spectral values according to a second coding algorithm to obtain coded evaluated residual spectral values.

In accordance with a seventh embodiment of the present invention, this object is achieved by an apparatus for decoding a bit stream which represents an audio signal, where the bit stream has signals coded according to a first coding algorithm, signals coded according to a second coding algorithm, and side information, where the signals coded according to the second coding algorithm have coded residual spectral values, where the residual spectral values are generated from evaluated spectral values by prediction over the frequency, where prediction coefficients of the prediction are present in the side information, comprising: a first decoder for decoding the coded signals which have been coded according to the first coding algorithm to obtain coded/decoded second time signals by means of the first coding algorithm; a second decoder for decoding the coded residual spectral values by means of the second coding algorithm to obtain the residual spectral values; a transformer for transforming the coded/decoded second time signals into the frequency domain to obtain the second spectral values; an inverse predictor for performing an inverse prediction with the evaluated residual spectral values using the prediction coefficients which are present in the side information to obtain the evaluated spectral values; a device for inversely evaluating the evaluated spectral values and the second spectral values to obtain the first spectral values; and an inverse transformer for transforming the first spectral values back into the time domain to obtain first time signals.

In accordance with an eighth embodiment of the present invention, this object is achieved by an apparatus for decoding a bit stream which represents an audio signal, where the bit stream has signals coded according to a first coding algorithm, signals coded according to a second coding algorithm, and side information, where the signals coded according to the second coding algorithm have coded residual spectral values, where the residual spectral values are generated from evaluated spectral values by prediction over the frequency, where prediction coefficients of the prediction are present in the side information, comprising: a first decoder for decoding the coded signals which have been coded according to the first coding algorithm to obtain coded/decoded second time signals by means of the first coding algorithm; a second decoder for decoding the coded residual spectral values by means of the second coding algorithm to obtain the residual spectral values; a transformer for transforming the coded/decoded second time signals into the frequency domain to obtain the second spectral values; a predictor for performing a prediction with the second spectral values using the prediction coefficients which are present in the side information to obtain second residual spectral values; a device for inversely evaluating the evaluated residual spectral values and the second residual spectral values to obtain the residual spectral values; an inverse predictor for performing an inverse prediction with the residual spectral values using the prediction coefficients which are stored in the side information to obtain first spectral values; and an inverse transformer for transforming the first spectral values back into the time domain to obtain first time signals.

The present invention is based on the finding that the determination of the TNS filter coefficients or prediction coefficients must be performed on the basis of spectral values which are not affected by the coder of the first stage. A scalable audio coder should, of course, also be a flexible coder which, as coder of the first stage, can utilize one of the variants cited in the introduction to the description. According to the present invention the determination of the TNS prediction coefficients is performed on the basis of spectral values which are a direct representation of the audio signal at the input of the coder. By employing a filter bank or an MDCT a spectral representation of the audio coder input signal can be created. However, it is now no longer possible to perform the determination of the TNS filter coefficients at the same place in the coder as the actual filtering by the TNS coding filter. The determination of the TNS filter coefficients must therefore take place separately from the actual TNS coding filtering.

According to a first aspect of the present invention the determination of the TNS filter coefficients is performed directly behind the filter bank which transforms the original audio input signal into the frequency domain. Thus signals of the same type, namely signals which have not been TNS processed, are present in front of the summer or the switching module. According to the first aspect of the present invention the TNS filtering with the already determined TNS coefficients takes place behind the switching module and in front of the quantizer/coder, which might operate according to the psychoacoustic model. As will be apparent later, this implementation of the TNS technique in the scalable audio coder involves a modification of the decoder, however.

According to a second aspect of the present invention this modification of the decoder is no longer necessary, however. Here the TNS prediction coefficients are again determined at the same place as for the first aspect. In contrast to the first aspect of the present invention, the two relevant spectral signals, i.e. the spectral signal with the coding error of the first stage and the spectral signal which is an essentially undistorted representation of the audio input signal, are processed by the TNS coding filter, which operates with the previously determined TNS coefficients, in front of the summing element. It is important to note that the TNS filtering of the spectral signal which bears the coding error of the coder of the first stage works without redetermination of the TNS coefficients, simply using the TNS coefficients derived from the error-free audio signal. According to the second aspect of the present invention two signals of the same type, here TNS-processed signals, are again present at the input of the summer or the switching module.

Generally speaking, the first and second aspects of the present invention differ in that in one instance signals which are not TNS-processed are present in front of the summer whereas in another instance TNS-processed signals are subjected to differencing or are fed into the switching module.

The cited conditions are taken into account in the decoders according to the present invention. In the case of a decoder which decodes a signal coded according to the first aspect of the present invention, the TNS decoding, i.e. the use of the TNS decoding filter employing the TNS coefficients determined when coding, which appear again as side information in the bit stream, takes place in front of an inverse switching module which is analogous to the switching module. As for the coder, the inverse switching module is thus supplied with signals which have not been TNS processed in the case of the decoder as well.

In the case of a decoder which decodes coded signals according to the second aspect of the present invention, on the other hand, the inverse switching module is fed with TNS-processed signals. To this end the decoded signal of the coder of the first stage must be converted into the frequency domain and filtered by means of a TNS coding filter which uses the TNS filter coefficients determined in the coder. Only then are signals of the same kind, namely TNS-processed signals, compared in the inverse switching module or the adder arranged in front of it, as was the case in principle for the coder according to the second aspect of the present invention. The output signals of the inverse switching module are finally fed into a TNS decoding filter, the output signals of which are then processed by an inverse filter bank so as to reproduce the original audio signal apart from the coding errors of the whole arrangement. As has already been mentioned, the coder or decoder according to the second aspect of the present invention is preferred among the embodiments according to the present invention since no substantial modifications are necessary in the decoder as the TNS decoding filter or the inverse TNS filter is arranged in front of the inverse filter bank, which corresponds to the arrangement in FIG. 10A.
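The decoder chain of the second aspect can be illustrated with a minimal sketch. It assumes an open-loop FIR predictor over the spectral index, hypothetical coefficient values, no quantization, and difference coding on every spectral line; the function names are illustrative and do not come from the specification.

```python
def tns_coding_filter(spectrum, coeffs):
    """Prediction over frequency: residual[n] = X[n] - sum(a_i * X[n - i])."""
    residual = []
    for n, value in enumerate(spectrum):
        prediction = sum(a * spectrum[n - i]
                         for i, a in enumerate(coeffs, start=1) if n - i >= 0)
        residual.append(value - prediction)
    return residual


def tns_decoding_filter(residual, coeffs):
    """Inverse prediction: rebuilds X[n] from the residual, line by line."""
    spectrum = []
    for n, value in enumerate(residual):
        prediction = sum(a * spectrum[n - i]
                         for i, a in enumerate(coeffs, start=1) if n - i >= 0)
        spectrum.append(value + prediction)
    return spectrum


def decode_second_aspect(evaluated_residual, x2cd_spectrum, coeffs):
    # 1. TNS-filter the first-stage spectrum with the transmitted coefficients.
    second_residual = tns_coding_filter(x2cd_spectrum, coeffs)
    # 2. Inverse evaluation (assumption: every line was difference coded).
    first_residual = [r + s for r, s in zip(evaluated_residual, second_residual)]
    # 3. The TNS decoding filter yields the first spectral values.
    return tns_decoding_filter(first_residual, coeffs)


# Coder side of the second aspect, without quantization (hypothetical data).
coeffs = [0.6, -0.2]                      # assumed TNS coefficients from X1
x1 = [1.0, 0.8, 0.3, -0.4, 0.2, 0.1]      # first spectral values
x2cd = [0.9, 0.7, 0.4, 0.0, 0.0, 0.0]     # spectrum of the first-stage output
evaluated_residual = [a - b for a, b in zip(tns_coding_filter(x1, coeffs),
                                            tns_coding_filter(x2cd, coeffs))]

x1_rebuilt = decode_second_aspect(evaluated_residual, x2cd, coeffs)
```

In this sketch the rebuilt values match the first spectral values up to rounding, because both sides filter the first-stage spectrum with the same coefficients.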

BRIEF DESCRIPTION OF THE DRAWINGS

Preferred embodiments of the present invention will be described in more detail below, making reference to the enclosed drawings in which

FIG. 1 shows a scalable audio coder according to a first aspect of the present invention;

FIG. 2 shows a scalable audio coder according to a second aspect of the present invention;

FIG. 3 shows a decoder according to the first aspect of the present invention;

FIG. 4 shows a decoder according to the second aspect of the present invention;

FIG. 5 shows a table illuminating the duality between the time domain and the frequency domain;

FIG. 6A shows an example for a transient signal;

FIG. 6B shows Hilbert envelope curves of partial bandpass signals on the basis of the transient time signal shown in FIG. 6A;

FIG. 7 shows a schematic representation of the prediction in the frequency domain;

FIG. 8A shows an example for illustrating the TNS technique;

FIG. 8B shows a comparison of the temporal behaviour of an induced quantization noise with (left) and without (right) the TNS technique;

FIG. 9A shows a simplified block diagram of an unscalable coder with a TNS filter;

FIG. 9B shows a detailed diagram of the TNS filter of FIG. 9A;

FIG. 10A shows a simplified block diagram of an unscalable decoder with an inverse TNS filter; and

FIG. 10B shows a detailed diagram of the inverse TNS filter of FIG. 10A.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

FIG. 1 shows a schematic block diagram of a scalable audio coder according to the present invention. A discrete time signal x₁ which has been sampled with a first sampling rate, e.g. 48 kHz, is brought to a second sampling rate, e.g. 8 kHz, by means of a downsampling filter 12, the second sampling rate being lower than the first sampling rate. The ratio of the first and second sampling rates is preferably a whole number. The output signal of the downsampling filter 12, which may be implemented as a decimation filter, is fed into a coder/decoder 14, which codes its input signal according to a first coding algorithm. The coder/decoder 14 can, as has already been mentioned, be a speech coder of lower order, such as e.g. a coder G.729, G.723, FS1016, MPEG-4 CELP, MPEG-4 PAR. Such coders operate at data rates of 4.8 kilobits per second (FS1016) up to data rates of about 8 kilobits per second (G.729). They all process signals which have been sampled with a sampling frequency of 8 kHz. For persons skilled in the art, however, it is obvious that any other coders with other data rates or other sampling frequencies can also be employed.

The signal coded by the coder 14, i.e. the coded second signal x_(2c), a bit stream which is dependent on the coder 14 and which is present at one of the cited bit rates, is fed into a bit formatter 18 via a line 16. The function of the bit formatter 18 will be described later. The downsampling filter 12 and the coder/decoder 14 constitute a first stage of the scalable audio coder according to the present invention.

The coded second signals x_(2c) which are output to the line 16 are also decoded again in the first coder/decoder 14 so as to generate coded/decoded second time signals x_(2cd) on a line 20. The coded/decoded second time signals x_(2cd) are discrete-time signals which have a lower bandwidth than the first discrete time signals x₁. For the cited numerical example the first discrete time signal x₁ has a maximum bandwidth of 24 kHz since the sampling frequency is 48 kHz. The coded/decoded second time signals x_(2cd) have a maximum bandwidth of 4 kHz since the downsampling filter 12 has converted the first time signal x₁ to a sampling frequency of 8 kHz through decimation. Within the bandwidth of 0 to 4 kHz the signals x₁ and x_(2cd) are the same apart from the coding errors introduced by the coder/decoder 14.

It should be pointed out here that the coding errors introduced by the coder 14 are not always small errors but that they may well be of the same order of magnitude as the useful signal, e.g. when a strongly transient signal is coded in the first coder. For this reason a check is made to see whether a difference coding makes any sense, as is explained later.

The signal x_(2cd) at the output of the coder/decoder 14 is fed into an upsampling filter 23 to convert it back to the high sampling rate again so that it can be compared with the signal x₁.

The upsampled signal x_(2cd) and the signal x₁ are respectively fed into a filter bank FB1 22 and a filter bank FB2 24. The filter bank FB1 22 generates spectral values X_(2cd) which are a frequency domain representation of the signals x_(2cd). The filter bank FB2 24, on the other hand, generates spectral values X₁ which are a frequency domain representation of the original first time signal x₁. The output signals of the two filter banks are subtracted in a summer 26. More precisely, the output spectral values X_(2cd) of the filter bank FB1 22 are subtracted from the output spectral values of the filter bank FB2 24. The summer 26 is followed by a switching module SM 28, which has as inputs both the output signal X_(d) of the summer 26 and the output signal X₁ of the filter bank 24, i.e. the spectral representation of the first time signals, which will hereafter be called first spectral values X₁.

According to a first aspect of the present invention, prediction coefficients for a TNS filter or for a prediction filter 27 which follows the switching module 28 are calculated by means of a device 25 for calculating the TNS coefficients. The TNS coefficients calculator 25 feeds the coefficients both to the TNS coding filter 27 and to the bit formatter 18, as can be seen from FIG. 1.

The TNS coding filter 27 feeds a quantizer/coder 30, which performs a quantization according to a psychoacoustic model, symbolized by a psychoacoustic module 32, as is known to persons skilled in the art. The two filter banks 22, 24, the summer 26, the switching module 28, the quantizer/coder 30 and the psychoacoustic module 32 constitute a second stage of the scalable audio coder according to the present invention.

In the following the operation of the scalable audio coder is explained making use of FIG. 1. As has already been said, the discrete first time signals x₁, which have been sampled with a first sampling rate, are fed into the downsampling filter 12 to generate second time signals x₂ whose bandwidth corresponds to a second sampling rate, the second sampling rate being lower than the first sampling rate. From these second time signals x₂ the coder/decoder 14 generates second coded time signals x_(2c) according to a first coding algorithm and, by means of a subsequent decoding according to the first coding algorithm, coded/decoded second time signals x_(2cd). The coded/decoded second time signals x_(2cd) are transformed into the frequency domain by the first filter bank FB1 22 to generate second spectral values X_(2cd) which are a frequency domain representation of the coded/decoded second time signals x_(2cd).

It should be pointed out here that the coded/decoded second time signals x_(2cd) are time signals with the second sampling frequency, i.e. 8 kHz in the example. The frequency domain representation of these signals and the first spectral values X₁ should now be evaluated, the first spectral values X₁ being generated from the first time signal x₁, which exhibits the first, i.e. high, sampling frequency, by means of the second filter bank FB2 24. In order to obtain comparable signals with an identical time and frequency resolution, the 8 kHz signal, i.e. the signal with the second sampling frequency, must be converted into a signal with the first sampling frequency. For the scalable coder it is not, however, imperative that the two sampling frequencies should be different; they can also have the same value.

Instead of using the upsampling filter, this can also be achieved by inserting a certain number of zero values between the individual discrete-time samples of the signal x_(2cd). The number of zero values is given by (the ratio of first sampling frequency to second sampling frequency) − 1. The ratio of the first (high) to the second (low) sampling frequency is called the upsampling factor. As is known to persons skilled in the art, the insertion of zeros, which is possible with very low computing effort, creates an aliasing effect in the signal x_(2cd), as a result of which the low-frequency or useful spectrum of the signal x_(2cd) is repeated, the number of repetitions being equal to the number of zeros inserted. The aliasing-afflicted signal x_(2cd) is now transformed into the frequency domain by the first filter bank FB1 so as to generate second spectral values X_(2cd).
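The zero insertion described above can be sketched as follows for the 48 kHz / 8 kHz example. The function name and the sample values are illustrative assumptions.

```python
def insert_zeros(samples, upsampling_factor):
    """Insert (upsampling_factor - 1) zeros after each input sample."""
    if upsampling_factor < 1:
        raise ValueError("upsampling_factor must be >= 1")
    out = []
    for s in samples:
        out.append(s)
        out.extend([0.0] * (upsampling_factor - 1))
    return out


first_rate_hz = 48000
second_rate_hz = 8000
factor = first_rate_hz // second_rate_hz   # upsampling factor: 6
zeros_per_sample = factor - 1              # 5 zeros between samples

x_2cd = [1.0, -2.0, 3.0]                   # hypothetical samples at 8 kHz
upsampled = insert_zeros(x_2cd, factor)    # 18 samples at 48 kHz
```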

The insertion of e.g. five zeros between successive samples of the coded/decoded second signal x_(2cd) results in a signal for which it is known from the start that only every sixth sample differs from zero. This fact can be exploited when transforming this signal into the frequency domain by means of a filter bank or MDCT or by means of an arbitrary Fourier transform, since it is possible e.g. to dispense with certain summations which arise for a simple FFT. The known-from-the-start structure of the signal to be transformed can thus be employed in an advantageous manner to save computing time when transforming the signal into the frequency domain.
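As an illustration of the saving, the sketch below uses a plain DFT as a stand-in for the filter bank or MDCT (an assumption made for brevity). It sums only over the samples known to be nonzero and compares the result with the full transform of the zero-stuffed signal.

```python
import cmath


def dft(signal):
    """Reference DFT that sums over every sample, zeros included."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
            for k in range(n)]


def dft_zero_stuffed(nonzero_samples, factor):
    """DFT of the zero-stuffed signal, skipping the known zero positions."""
    m = len(nonzero_samples)
    n = m * factor
    return [sum(nonzero_samples[i] * cmath.exp(-2j * cmath.pi * k * i * factor / n)
                for i in range(m))
            for k in range(n)]


x_2cd = [1.0, -2.0, 3.0, 0.5]   # hypothetical low-rate samples
factor = 6                      # five zeros inserted after each sample

stuffed = []
for s in x_2cd:
    stuffed.extend([s] + [0.0] * (factor - 1))

full = dft(stuffed)                        # 24 terms per spectral line
sparse = dft_zero_stuffed(x_2cd, factor)   # 4 terms per spectral line
max_error = max(abs(a - b) for a, b in zip(full, sparse))
```

Both transforms agree up to rounding, and the sparse result repeats with a period of `len(x_2cd)` lines, which is the spectral repetition mentioned above.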

The second spectral values X_(2cd) are a correct representation of the coded/decoded second time signal x_(2cd) only in the lower part, for which reason only the 1/(upsampling factor) part of all the spectral lines X_(2cd) is used at the output of the filter bank FB1. It should be pointed out here that, due to the insertion of zeros in the coded/decoded second time signal x_(2cd), the spectral lines X_(2cd) which are used now have the same time and frequency resolution as the first spectral values X₁, which are a frequency representation without aliasing disturbance of the first time signal x₁. In the subtracter 26 and in the switching module 28 the two signals X_(2cd) and X₁ are evaluated so as to generate evaluated spectral values X_(b). The switching module 28 now executes a so-called simulcast difference switchover.

It is not always advantageous to employ a difference coding in the second stage. This is the case, for example, when the difference signal, i.e. the output signal of the summer 26, has a higher energy than the output signal X₁ of the second filter bank. Since, moreover, an arbitrary coder can be used for the coder/decoder 14 of the first stage, the coder may produce certain signals which are difficult to code. The coder/decoder 14 should preferably preserve phase information of the signal which it has coded, a process which experts call “waveform coding” or “signal form coding”. The decision in the switching module 28 of the second stage as to whether a difference coding or a simulcast coding is to be used is made on the basis of the frequency.

“Difference coding” means that only the difference between the second spectral values X_(2cd) and the first spectral values X₁ is coded. If such difference coding is not advantageous, however, since the energy content of the difference signal is greater than the energy content of the first spectral values X₁, difference coding is not employed. If difference coding is not employed, the first spectral values X₁ of the time signal x₁, which is sampled with 48 kHz in the example, are switched through by the switching module 28 and used as output signal of the switching module SM 28.

Since the difference formation takes place in the frequency domain, there is no problem in making a frequency selective choice between simulcast and difference coding since the difference between the two signals X₁ and X_(2cd) is calculated in any case. The difference formation in the spectrum thus permits a simple frequency selective choice of the frequency ranges which should be difference coded. In principle there could be a changeover from a difference to a simulcast coding for each spectral value individually. This requires too great an amount of side information, however, and is not absolutely necessary. It is thus better e.g. to compare the energies of the difference spectral values and of the first spectral values in frequency groups. Alternatively, certain frequency bands can be specified from the start, e.g. eight bands, each of width 500 Hz, which again results in the bandwidth of the signal X_(2cd) if the time signal x₂ has a bandwidth of 4 kHz. A compromise when stipulating the frequency bands consists in balancing the amount of side information to be transmitted, i.e. whether the difference coding is or is not active in a frequency band, against the benefit accruing from difference coding as frequently as possible.

Side information, e.g. 8 bits per band, an on/off bit for the difference coding or some other suitable coding, can be transmitted in the bit stream, showing whether a particular frequency band is difference coded or not. In the decoder, which will be described later, only the corresponding subbands of the first coder are then added during reconstruction.
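The decoder-side use of such per-band flags can be sketched as follows. The band layout (given as spectral line indices), the flag representation and all values are hypothetical; the sketch only shows that the first-stage spectrum is added back in bands marked as difference coded.

```python
def inverse_evaluate(evaluated, x2cd_spectrum, band_edges, diff_flags):
    """Add the first-stage spectrum back only in difference-coded bands.

    band_edges holds the first line index of each band plus the end index;
    diff_flags[i] is True when band i was difference coded.
    """
    restored = list(evaluated)
    for band, diff_coded in enumerate(diff_flags):
        lo, hi = band_edges[band], band_edges[band + 1]
        if diff_coded:
            for i in range(lo, hi):
                restored[i] = evaluated[i] + x2cd_spectrum[i]
    return restored


# Two hypothetical bands of two lines each; band 0 difference coded,
# band 1 simulcast coded.
evaluated = [0.1, -0.1, 2.0, 1.0]
x2cd_spectrum = [3.9, 3.1, -2.0, -1.0]
restored = inverse_evaluate(evaluated, x2cd_spectrum, [0, 2, 4], [True, False])
```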

A step of evaluating the first spectral values X₁ and the second spectral values X_(2cd) thus preferably comprises the subtraction of the second spectral values X_(2cd) from the first spectral values X₁ so as to obtain difference spectral values X_(d). Also, the energies of a multiplicity of spectral values in a predetermined band, e.g. 500 Hz in the 8 kHz example, are then calculated for the difference spectral values X_(d) and for the first spectral values X₁ in a known manner, e.g. by squaring and summing. A frequency selective comparison of the respective energies is now performed in each frequency band. If the energy in a particular frequency band of the difference spectral values X_(d) exceeds the energy of the first spectral values X₁ multiplied by a predetermined factor k, it is decided that the evaluated spectral values X_(b) are the first spectral values X₁. Otherwise it is decided that the difference spectral values X_(d) are the evaluated spectral values X_(b). The factor k can range e.g. from about 0.1 to 10. For values of k less than 1 a simulcast coding is already employed when the difference signal has a smaller energy than the original signal. For values of k greater than 1, on the other hand, a difference coding continues to be used, even when the energy content of the difference signal is already larger than that of the original signal not coded in the first coder. If a simulcast coding is selected, the switching module 28 will switch through the output signals of the second filter bank 24 directly. As an alternative to the difference formation which has been described, an evaluation can also be performed in which e.g. a ratio is formed or a multiplication or some other operation is performed on the two cited signals.
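The band-wise decision can be sketched as follows. Real-valued spectral lines, the band layout in line indices and the numerical values are assumptions for illustration; lines above the last band edge are passed through as simulcast values.

```python
def band_energy(values):
    """Energy of a band: square each spectral value, then sum."""
    return sum(v * v for v in values)


def evaluate_bands(x1, x2cd, band_edges, k=1.0):
    """Return (X_b, flags); flags[i] is True when band i is difference coded."""
    xb = list(x1)                      # default: simulcast (X1 switched through)
    flags = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        diff = [a - b for a, b in zip(x1[lo:hi], x2cd[lo:hi])]
        # Simulcast if the difference energy exceeds k times the energy of X1.
        use_difference = band_energy(diff) <= k * band_energy(x1[lo:hi])
        if use_difference:
            xb[lo:hi] = diff
        flags.append(use_difference)
    return xb, flags


x1 = [4.0, 3.0, 2.0, 1.0, 1.0, 1.0]          # first spectral values
x2cd = [3.9, 3.1, -2.0, -1.0, 0.0, 0.0]      # first-stage spectrum
band_edges = [0, 2, 4]                       # two bands; lines 4 and 5 lie above

xb, flags = evaluate_bands(x1, x2cd, band_edges, k=1.0)
xb_k10, flags_k10 = evaluate_bands(x1, x2cd, band_edges, k=10.0)
```

With k = 1 the first band is difference coded and the second is simulcast coded, because the first stage inverted the sign there. With k = 10 both bands are difference coded, which matches the behaviour described for k greater than 1.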

The TNS coding filter 27, which is connected to the output of the switching module 28, now performs a prediction of the evaluated spectral values X_(b) over the frequency using the prediction coefficients calculated by the TNS coefficients calculator 25 so as to obtain evaluated residual spectral values.
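A minimal sketch of a prediction over the frequency is given below. It assumes an open-loop FIR predictor running along the spectral index and a sign convention in which the prediction is subtracted; the coefficient value and the spectrum are hypothetical and chosen so that the result is easy to check by hand.

```python
def predict_over_frequency(spectrum, coeffs):
    """Residual r[n] = X[n] - sum(a_i * X[n - i]), with n the spectral index."""
    residual = []
    for n, value in enumerate(spectrum):
        prediction = sum(a * spectrum[n - i]
                         for i, a in enumerate(coeffs, start=1) if n - i >= 0)
        residual.append(value - prediction)
    return residual


# Each spectral line is half of the previous one, so a first-order
# predictor with a_1 = 0.5 removes everything but the first line.
x_b = [1.0, 0.5, 0.25, 0.125]
residual = predict_over_frequency(x_b, [0.5])
# residual == [1.0, 0.0, 0.0, 0.0]
```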

The evaluated residual spectral values, which correspond either to the difference spectral values X_(d) or to the first spectral values X₁, as determined by the switching module 28, are now quantized by the quantizer/coder 30 taking account of the psychoacoustic model, which is known to persons skilled in the art and which is present in the psychoacoustic module 32, and are then coded, preferably by means of a redundancy reducing coding, e.g. using Huffman tables. As is also known to persons skilled in the art, the psychoacoustic model is calculated from time signals, which is why the first time signal x₁ with the high sampling rate is fed directly into the psychoacoustic module 32, as can be seen in FIG. 1. The output signal X_(cb) of the quantizer/coder 30 is routed directly to the bit formatter 18 on the line 42 and is written into the output signal x_(AUS).

A scalable audio coder with a first and a second stage has been described above. The concept of the scalable audio coder according to the present invention is also capable of cascading more than two stages. Thus it would e.g. be possible with an input signal x₁ which is sampled with 48 kHz to code the first 4 kHz of the spectrum in the first coder/decoder 14 by reducing the sampling rate to achieve a signal quality after decoding which corresponds roughly to the speech quality of telephone calls. In the second stage a bandwidth coding of up to 12 kHz could be performed, implemented by the quantizer/coder 30, to achieve a tone quality which corresponds roughly to HIFI quality. It is obvious to persons skilled in the art that a signal x₁ which is sampled with 48 kHz can have a bandwidth of 24 kHz. The third stage, implemented by the additional quantizer/coder 38, could now perform a coding up to a bandwidth of max. 24 kHz or, for a practical example, up to e.g. 20 kHz, to achieve a tone quality which corresponds roughly to that of a compact disc (CD).

Apart from the side information which must also be transmitted, the coded data stream x_(AUS) comprises the following signals:

the coded second signals x_(2c) (full spectrum from 0 to 4 kHz); and

the coded evaluated residual spectral values (full spectrum from 0 to 12 kHz for a simulcast coding or coding errors from 0 to 4 kHz of the coder 14 and full spectrum from 4 to 12 kHz for a difference coding).

In the example, transition disturbances may arise at the handover from the first coder/decoder 14 to the quantizer/coder 30, i.e. at the transition from 4 kHz to a value greater than 4 kHz. These transition disturbances may manifest themselves in erroneous spectral values which are written into the bit stream x_(AUS). The total coder/decoder can now be so specified that e.g. only the frequency lines up to 1/(upsampling factor minus x) (x=1, 2, 3) are used. As a result the last spectral lines of the signal X_(2cd) at the end of the maximum bandwidth attainable with the second sampling frequency are not considered. Implicitly this means that an evaluation function is employed which in the cited case is a rectangular function which is zero above a certain frequency value and which has a value of 1 below this. Alternatively a “softer” evaluation function can also be employed, which reduces the amplitude of spectral lines which have transition disturbances, after which the spectral lines of reduced amplitude are then considered.

It should be noted that the transition disturbances are not audible since they are eliminated again in the decoder. The transition disturbances can, however, lead to excessive difference signals for which the coding gain due to the difference coding is then reduced. By evaluation with an evaluation function such as that described above, the loss in coding gain can therefore be kept within limits. An evaluation function other than the rectangular function will not require any additional side information since, like the rectangular function, it can be agreed a priori for the coder and decoder.
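The two kinds of evaluation function can be sketched as weights applied to the spectral lines of X_(2cd). The linear taper used for the “softer” variant, the cutoff index and the ramp length are assumptions; the text only requires that coder and decoder agree on the function a priori.

```python
def rectangular_weights(num_lines, cutoff):
    """Weight 1.0 below the cutoff line index, 0.0 from the cutoff upwards."""
    return [1.0 if i < cutoff else 0.0 for i in range(num_lines)]


def soft_weights(num_lines, cutoff, ramp):
    """Weight 1.0 below (cutoff - ramp), linear taper up to the cutoff, then 0.0."""
    weights = []
    for i in range(num_lines):
        if i < cutoff - ramp:
            weights.append(1.0)
        elif i < cutoff:
            weights.append((cutoff - i) / (ramp + 1))
        else:
            weights.append(0.0)
    return weights


def apply_weights(spectrum, weights):
    """Scale each spectral line by its evaluation weight."""
    return [s * w for s, w in zip(spectrum, weights)]


x2cd_spectrum = [1.0] * 8                 # hypothetical first-stage spectrum
rect = rectangular_weights(8, cutoff=6)   # last two lines are dropped
soft = soft_weights(8, cutoff=6, ramp=2)  # lines 4 and 5 are attenuated
weighted = apply_weights(x2cd_spectrum, soft)
```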

FIG. 2 shows a practical implementation of a coder which works according to the second aspect of the present invention. The same elements as in FIG. 1 bear the same reference numerals and, unless specifically stated otherwise, fulfil the same functions. As has already been explained, the second aspect of the present invention is better for the decoder since it requires fewer modifications to be made. In contrast to the scalable audio coder of FIG. 1, in FIG. 2 a second TNS coding filter 27 is located behind the filter bank FB1 22. Furthermore, the first TNS coding filter is already located behind the filter bank FB2 24, which means that the summer 26 and the switching module 28 process TNS-processed spectral values, namely first residual spectral values and second residual spectral values. In the switching module 28 and the summer 26 the first residual spectral values are thus evaluated with the second residual spectral values to obtain evaluated residual spectral values, which are then fed into the quantizer/coder 30. The quantizer/coder 30 thus quantizes and codes evaluated residual spectral values, as in FIG. 1. The TNS coefficients calculator 25 feeds both the TNS coding filter behind the filter bank 24 and the TNS coding filter behind the filter bank 22; the output signal of the filter bank 22 is, however, subjected to a TNS filtering which is performed on the basis of the TNS coefficients which have been calculated from the output signal of the filter bank 24. As in FIG. 1, the TNS coefficients are supplied to the bit formatter 18 as side information.

FIG. 3 shows a decoder for decoding the data coded by the scalable audio coder shown in FIG. 1. The output data stream of the bit stream formatter 18 of FIG. 1 is fed to a demultiplexer 46 to obtain the signals on the lines 42 and 16 of FIG. 1 from the data stream x_(AUS). The coded second signals x_(2C) are fed into a delay element 48, the delay element 48 introducing a delay into the data which may be necessary on account of other aspects of the system and which forms no part of the present invention.

After the delay the coded second signals x_(2c) are fed into a decoder 50, which decodes by means of the first coding algorithm, which is also implemented in the coder/decoder 14 of FIG. 1, so as to generate the coded/decoded second time signal x_(cd2), which can be output via a line 52, as shown in FIG. 3. The coded evaluated residual spectral values are requantized by means of a requantizer 54 to obtain the evaluated residual spectral values. A summer 58 forms the sum of the residual spectral values and the residual spectral values of an optional further layer (shown dashed).

To create the same conditions again in front of a summer 62, which works analogously to the summer 26, the summer 58 is followed by a TNS decoding filter 59. The TNS decoding filter 59 performs an inverse TNS filtering with the output signal of the summer 58. Here the prediction coefficients which are contained in the side information are used, these having been calculated by the TNS coefficients calculator 25 of FIG. 2. At the output of the TNS decoder 59 are the decoded evaluated spectral values X_(b).

It should be pointed out here that, as can be seen from FIG. 3, the coded/decoded second time signal must first be converted by means of a suitable upsampling filter 63 and transformed into the frequency domain by means of a filter bank 64 to obtain the second spectral values X_(2cd) since the summation of the summer 62 is a summation of spectral values. The filter bank 64 is preferably identical to the filter banks FB1 22 and FB2 24, whereby only one device has to be implemented, which, equipped with suitable buffers, is supplied with different signals in succession. Alternatively, different filter banks, provided they are suitable, may be used.
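
The order of operations in the decoder of FIG. 3 can be sketched as follows, under stated assumptions: the TNS decoding filter is modelled as the recursive inverse of a plain linear prediction over the frequency index, the inverse evaluation is a simple addition of the second spectral values, and requantization, upsampling and the filter banks are omitted. The names `tns_synthesis` and `decode_first_aspect` and all numeric values are hypothetical.

```python
def tns_synthesis(residual, coeffs):
    """Inverse prediction over frequency: rebuild each spectral line from its
    residual plus the weighted, already reconstructed lower-frequency lines."""
    spectrum = []
    for k, value in enumerate(residual):
        prediction = sum(a * spectrum[k - 1 - i]
                         for i, a in enumerate(coeffs) if k - 1 - i >= 0)
        spectrum.append(value + prediction)
    return spectrum


def decode_first_aspect(evaluated_residual, second_spectrum, coeffs):
    """Inverse TNS first (filter 59), then the summation of spectral values
    (summer 62) with the spectrum of the coded/decoded second signal."""
    evaluated_spectrum = tns_synthesis(evaluated_residual, coeffs)  # X_b
    return [xb + x2 for xb, x2 in zip(evaluated_spectrum, second_spectrum)]


# Assumed example data; in a real decoder these come from the bit stream.
coeffs = [0.5, -0.25]
second_spectrum = [3.5, 2.5, 1.5, 0.0, 0.0, 0.0]   # X_2cd
evaluated_residual = [0.5, 0.25, 0.375, 0.875, 0.125, 0.25]

first_spectrum = decode_first_aspect(evaluated_residual, second_spectrum, coeffs)
```

The sketch makes the constraint in the text visible: the addition only makes sense once both operands are spectral values of the same length, which is why the second time signal has to be upsampled and transformed before it reaches the summer.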

As has already been mentioned, information which is used in the quantization of spectral values is derived from the first time signal x₁ by means of the psychoacoustic module 32. A particular effort is made to quantize the spectral values as coarsely as possible to minimize the amount of data for transmission. On the other hand, disturbances introduced by the quantization should not be audible. A model which is known per se and which is contained in the psychoacoustic module 32 is used to calculate an allowed disturbance energy which can be introduced by the quantization without any disturbance being audible. A control now controls the quantizer in a known quantizer/coder to perform a quantization which introduces a quantization disturbance which is smaller than or equal to the allowed disturbance. This is constantly monitored in known systems in that the signal quantized by the quantizer, which is contained in e.g. block 30, is dequantized again. By comparing the input signal in the quantizer with the quantized/dequantized signal the disturbance energy actually introduced by the quantization is calculated. The actual disturbance energy of the quantized/dequantized signal is compared in the control with the allowed disturbance energy. If the actual disturbance energy is greater than the allowed disturbance energy, the control in the quantizer will increase the fineness of the quantization. The comparison between the allowed and actual disturbance energy typically takes place per psychoacoustic frequency band. This method is known and is used by the scalable audio coder according to the present invention if simulcast coding is employed.
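
The control loop described above can be sketched for a single psychoacoustic band. This is only an illustration under assumptions: a uniform quantizer with one step size per band stands in for the actual quantizer, the step is halved whenever the measured disturbance is too large, and the function names, the starting step and the example values are hypothetical.

```python
def quantize(values, step):
    """Uniform quantizer (an assumption): map each value to an integer index."""
    return [round(v / step) for v in values]


def dequantize(indices, step):
    """Map the integer indices back to spectral values."""
    return [q * step for q in indices]


def disturbance_energy(values, step):
    """Energy of the difference between the quantizer input and the
    quantized/dequantized signal."""
    restored = dequantize(quantize(values, step), step)
    return sum((v - r) ** 2 for v, r in zip(values, restored))


def choose_step(values, allowed_energy, start_step=8.0, min_step=1e-6):
    """Start coarse and refine until the actual disturbance energy no longer
    exceeds the allowed disturbance energy for this band."""
    step = start_step
    while disturbance_energy(values, step) > allowed_energy and step > min_step:
        step *= 0.5  # increase the fineness of the quantization
    return step


# Assumed example: one band of spectral values and its allowed disturbance.
band = [3.2, -1.7, 0.9, 5.4]
allowed = 0.1
step = choose_step(band, allowed)
actual = disturbance_energy(band, step)
```

Running the loop once per band, each with its own allowed energy from the psychoacoustic model, gives the per-band comparison mentioned in the text. Starting from a coarse step reflects the stated aim of quantizing as coarsely as the allowed disturbance permits.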

A so-called post-filter 67, which can perform certain post-filterings of the output signal of the decoder, which corresponds to the decoder of the first stage, is located at the output of the decoder 50. This filter does not constitute any part of the present invention, however.

FIG. 4 shows a decoder similar to that in FIG. 3. However, the decoder shown in FIG. 4 works for signals which have been coded according to the second aspect of the present invention. In contrast to FIG. 3, the inverse switching module 60 works with TNS-coded input signals, whereas the inverse switching module 60 of FIG. 3 works with non-TNS-processed input signals, i.e. TNS-decoded signals. Since the output signal of the decoder 50 was not TNS-coded anywhere, not even in the coder, it must be filtered by a TNS coding filter 27, which can be implemented in the same way as the TNS coding filters 27 of FIG. 1 and FIG. 2. In the decoder according to the second aspect of the present invention the concluding TNS decoding filter 59 is located directly in front of the inverse filter bank 66, which can reverse the filter bank operations of the filter banks 22 and 24. This arrangement is preferred since it corresponds to the arrangement shown in FIG. 10A, which can normally be found in transform coders. Both the TNS decoding filter 59 and the TNS coding filter 27 are supplied with prediction coefficients, which the demultiplexer 46 extracts from the side information of the coded bit stream x_(aus).

The additional TNS coding filter 27 in the decoder according to FIG. 4 represents only a minimally higher outlay, since the parameters ascertained during the TNS filter parameter determination are transmitted in any case so as to be able to calculate the TNS decoding filter. The same are also sufficient to calculate the TNS coding filter in the decoder. No change is needed in the transmitted bit stream.
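
A minimal sketch of the decoder of FIG. 4 shows why the transmitted coefficients suffice for both filters. It assumes a plain linear prediction over the frequency index for the TNS coding filter, its recursive inverse for the TNS decoding filter, and a simple addition as the inverse evaluation. Quantization is omitted, and the names `tns_analysis`, `tns_synthesis` and `decode_second_aspect` as well as the example values are hypothetical.

```python
def tns_analysis(spectrum, coeffs):
    """TNS coding filter (assumed form): open-loop prediction over frequency."""
    residual = []
    for k, value in enumerate(spectrum):
        prediction = sum(a * spectrum[k - 1 - i]
                         for i, a in enumerate(coeffs) if k - 1 - i >= 0)
        residual.append(value - prediction)
    return residual


def tns_synthesis(residual, coeffs):
    """TNS decoding filter (assumed form): recursive inverse of tns_analysis."""
    spectrum = []
    for k, value in enumerate(residual):
        prediction = sum(a * spectrum[k - 1 - i]
                         for i, a in enumerate(coeffs) if k - 1 - i >= 0)
        spectrum.append(value + prediction)
    return spectrum


def decode_second_aspect(evaluated_residual, second_spectrum, coeffs):
    """TNS-code the first-stage spectrum, undo the evaluation in the residual
    domain, then apply the concluding TNS decoding filter."""
    second_residual = tns_analysis(second_spectrum, coeffs)       # filter 27
    residual = [e + r2 for e, r2 in zip(evaluated_residual, second_residual)]
    return tns_synthesis(residual, coeffs)                        # filter 59


# Assumed encoder-side data, produced with the same coefficient set.
coeffs = [0.5, -0.25]
first_spectrum = [4.0, 3.0, 2.0, 1.0, 0.5, 0.25]
second_spectrum = [3.5, 2.5, 1.5, 0.0, 0.0, 0.0]
evaluated_residual = [r1 - r2 for r1, r2 in zip(
    tns_analysis(first_spectrum, coeffs), tns_analysis(second_spectrum, coeffs))]

recovered = decode_second_aspect(evaluated_residual, second_spectrum, coeffs)
```

Both `tns_analysis` and `tns_synthesis` take the same `coeffs` argument, which mirrors the point made above: one transmitted coefficient set defines the coding filter and the decoding filter, so the bit stream needs no extra fields.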

For persons skilled in the art it is obvious that the example which has been presented, in which the first sampling frequency is 48 kHz and the second sampling frequency is 8 kHz, is simply exemplary. A smaller frequency than 8 kHz may also be used as the second lower sampling frequency. As sampling frequencies for the whole system the following frequencies may be used: 48 kHz, 44.1 kHz, 32 kHz, 24 kHz, 22.05 kHz, 16 kHz, 8 kHz or some other suitable sampling frequency. The bit rate range of the coder/decoder 14 of the first stage can, as already mentioned, range from 4.8 kbits per second up to 8 kbits per second. The bit rate range of the second coder in the second stage can range from 0 to 64, 69.659, 96, 128, 192 and 256 kbits per second at sampling rates of 48, 44.1, 32, 24, 16 and 8 kHz. The bit rate range of the coder in the third stage can range from 8 kbits per second to 448 kbits per second for all sampling rates.

What is claimed is:
1. A method for coding discrete first time signals which have been sampled with a first sampling rate, comprising the following steps: generating second time signals, whose bandwidth corresponds to a second sampling rate, from the first time signals, the second sampling rate being equal to or less than the first sampling rate; coding the second time signals according to a first coding algorithm to obtain coded second signals; decoding the coded second signals according to the first coding algorithm to obtain coded/decoded second time signals whose bandwidth corresponds to the second sampling frequency; transforming the first time signals into the frequency domain to obtain first spectral values; calculating prediction coefficients from the first spectral values; generating second spectral values from coded/decoded second time signals, the second spectral values being a representation of the coded/decoded second time signals in the frequency domain; evaluating the first spectral values with the second spectral values to obtain evaluated spectral values whose number corresponds to the number of the first spectral values; performing a prediction of the evaluated spectral values over the frequency by means of the calculated prediction coefficients to obtain evaluated residual spectral values; and coding the evaluated residual spectral values according to a second coding algorithm to obtain coded evaluated residual spectral values.
2. A method for coding discrete first time signals which have been sampled with a first sampling rate, comprising the following steps: generating second time signals, whose bandwidth corresponds to a second sampling rate, from the first time signals, the second sampling rate being equal to or less than the first sampling rate; coding the second time signals according to a first coding algorithm to obtain coded second signals; decoding the coded second signals according to the first coding algorithm to obtain coded/decoded second time signals whose bandwidth corresponds to the second sampling frequency; transforming the first time signals into the frequency domain to obtain first spectral values; calculating prediction coefficients from the first spectral values; generating second spectral values from coded/decoded second time signals, the second spectral values being a representation of the coded/decoded second time signals in the frequency domain; performing a prediction of the first spectral values and the second spectral values over the frequency to obtain first residual spectral values and second residual spectral values, using the calculated prediction coefficients; evaluating the first residual spectral values with the second residual spectral values to obtain evaluated residual spectral values whose number corresponds to the number of the first spectral values; and coding the evaluated residual spectral values according to a second coding algorithm to obtain coded evaluated residual spectral values.
3. A method for decoding a bit stream which represents an audio signal, where the bit stream has signals coded according to a first coding algorithm, signals coded according to a second coding algorithm, and side information, where the signals coded according to the second coding algorithm have coded residual spectral values, where the residual spectral values are generated from evaluated spectral values by prediction over the frequency, where prediction coefficients of the prediction are present in the side information, comprising the following steps: decoding the coded signals which have been coded according to the first coding algorithm to obtain coded/decoded second time signals by means of the first coding algorithm; decoding the coded residual spectral values by means of the second coding algorithm to obtain the residual spectral values; transforming the coded/decoded second time signals into the frequency domain to obtain the second spectral values; performing an inverse prediction with the evaluated residual spectral values using the prediction coefficients which are present in the side information to obtain the evaluated spectral values; inversely evaluating the evaluated spectral values and the second spectral values to obtain the first spectral values; and transforming the first spectral values back into the time domain to obtain first time signals.
4. A method for decoding a bit stream which represents an audio signal, where the bit stream has signals coded according to a first coding algorithm, signals coded according to a second coding algorithm, and side information, where the signals coded according to the second coding algorithm have coded residual spectral values, where the residual spectral values are generated from evaluated spectral values by prediction over the frequency, where prediction coefficients of the prediction are present in the side information, comprising the following steps: decoding the coded signals which have been coded according to the first coding algorithm to obtain coded/decoded second time signals by means of the first coding algorithm; decoding the coded residual spectral values by means of the second coding algorithm to obtain the residual spectral values; transforming the coded/decoded second time signals into the frequency domain to obtain the second spectral values; performing a prediction with the second spectral values using the prediction coefficients which are present in the side information to obtain second residual spectral values; inversely evaluating the evaluated residual spectral values and the second residual spectral values to obtain the residual spectral values; performing an inverse prediction with the residual spectral values using the prediction coefficients which are stored in the side information to obtain first spectral values; and transforming the first spectral values back into the time domain to obtain first time signals.
5. An apparatus for coding discrete first time signals which have been sampled with a first sampling rate, comprising the following features: a device for generating second time signals, whose bandwidth corresponds to a second sampling rate, from the first time signals, the second sampling rate being equal to or less than the first sampling rate; a first coder for coding the second time signals according to a first coding algorithm to obtain coded second signals; a decoder for decoding the coded second signals according to the first coding algorithm to obtain coded/decoded second time signals whose bandwidth corresponds to the second sampling frequency; a transformer for transforming the first time signals into the frequency domain to obtain first spectral values; a calculator for calculating prediction coefficients from the first spectral values; a device for generating second spectral values from coded/decoded second time signals, the second spectral values being a representation of the coded/decoded second time signals in the frequency domain; a device for evaluating the first spectral values with the second spectral values to obtain evaluated spectral values whose number corresponds to the number of the first spectral values; a predictor for performing a prediction of the evaluated spectral values over the frequency by means of the calculated prediction coefficients to obtain evaluated residual spectral values; and a second coder for coding the evaluated residual spectral values according to a second coding algorithm to obtain coded evaluated residual spectral values.
6. An apparatus for coding discrete first time signals which have been sampled with a first sampling rate, comprising the following features: a device for generating second time signals, whose bandwidth corresponds to a second sampling rate, from the first time signals, the second sampling rate being equal to or less than the first sampling rate; a first coder for coding the second time signals according to a first coding algorithm to obtain coded second signals; a decoder for decoding the coded second signals according to the first coding algorithm to obtain coded/decoded second time signals whose bandwidth corresponds to the second sampling frequency; a transformer for transforming the first time signals into the frequency domain to obtain first spectral values; a calculator for calculating prediction coefficients from the first spectral values; a device for generating second spectral values from coded/decoded second time signals, the second spectral values being a representation of the coded/decoded second time signals in the frequency domain; a predictor for performing a prediction of the first spectral values and the second spectral values over the frequency to obtain first residual spectral values and second residual spectral values, using the calculated prediction coefficients; a device for evaluating the first residual spectral values with the second residual spectral values to obtain evaluated residual spectral values whose number corresponds to the number of the first spectral values; and a second coder for coding the evaluated residual spectral values according to a second coding algorithm to obtain coded evaluated residual spectral values.
7. An apparatus for decoding a bit stream which represents an audio signal, where the bit stream has signals coded according to a first coding algorithm, signals coded according to a second coding algorithm, and side information, where the signals coded according to the second coding algorithm have coded residual spectral values, where the residual spectral values are generated from evaluated spectral values by prediction over the frequency, where prediction coefficients of the prediction are present in the side information, comprising the following features: a first decoder for decoding the coded signals which have been coded according to the first coding algorithm to obtain coded/decoded second time signals by means of the first coding algorithm; a second decoder for decoding the coded residual spectral values by means of the second coding algorithm to obtain the residual spectral values; a transformer for transforming the coded/decoded second time signals into the frequency domain to obtain the second spectral values; an inverse predictor for performing an inverse prediction with the evaluated residual spectral values using the prediction coefficients which are present in the side information to obtain the evaluated spectral values; a device for inversely evaluating the evaluated spectral values and the second spectral values to obtain the first spectral values; and an inverse transformer for transforming the first spectral values back into the time domain to obtain first time signals.
8. An apparatus for decoding a bit stream which represents an audio signal, where the bit stream has signals coded according to a first coding algorithm, signals coded according to a second coding algorithm, and side information, where the signals coded according to the second coding algorithm have coded residual spectral values, where the residual spectral values are generated from evaluated spectral values by prediction over the frequency, where prediction coefficients of the prediction are present in the side information, comprising the following features: a first decoder for decoding the coded signals which have been coded according to the first coding algorithm to obtain coded/decoded second time signals by means of the first coding algorithm; a second decoder for decoding the coded residual spectral values by means of the second coding algorithm to obtain the residual spectral values; a transformer for transforming the coded/decoded second time signals into the frequency domain to obtain the second spectral values; a predictor for performing a prediction with the second spectral values using the prediction coefficients which are present in the side information to obtain second residual spectral values; a device for inversely evaluating the evaluated residual spectral values and the second residual spectral values to obtain the residual spectral values; an inverse predictor for performing an inverse prediction with the residual spectral values using the prediction coefficients which are stored in the side information to obtain first spectral values; and an inverse transformer for transforming the first spectral values back into the time domain to obtain first time signals.